Skip to content

Comments

Round-trip Testing#187

Open
Nikil-Shyamsunder wants to merge 20 commits intomainfrom
roundtrip-test-turnt
Open

Round-trip Testing#187
Nikil-Shyamsunder wants to merge 20 commits intomainfrom
roundtrip-test-turnt

Conversation

@Nikil-Shyamsunder
Copy link
Collaborator

@Nikil-Shyamsunder Nikil-Shyamsunder commented Feb 15, 2026

This PR focuses on infrastructure, not semantic bug fixes.

We now integrate roundtrip checks with Turnt instead of running only a standalone script.

  • Added a Turnt roundtrip environment for roundtrip
  • Roundtrip command runs per-.tx file via: scripts/roundtrip_case.py {filename}
  • Added a convenience entry in justfile to run roundtrip through Turnt.
  • CI now runs roundtrip tests

For each .tx file:

  1. Parse // ARGS and // RETURN metadata.
  2. Skip tests with non-zero // RETURN (recorded as skip).
  3. For each trace in the .tx file, run interpreter to generate waveform
  4. Run monitor on generated waveform using the same protocol context.
  5. Compare expected trace block from .tx against monitor-emitted trace block(s) after normalization (get rid of comments, whitespace etc).
  6. Emit success/skipped/failure messages

Turnt snapshots roundtrip stdout into .rt files

Updated the turnt output to also have the expected (monitor) traces and the actual (interpreter) trace. so for a succeeding test, it might look like:

trace_block: 0
trace_result: PASS
matched_monitor_trace_index: 0
interpreter_trace:
trace {
    multiple_assign(1, 1);
    multiple_assign(1, 1);
}

parsed_monitor_trace_candidates:
candidate_monitor_trace: 0
trace {
    multiple_assign(1, 1);
    multiple_assign(1, 1);
}

candidate_monitor_trace: 1
trace {
    multiple_assign(1, 1);
    two_fork_err(1, 1);
}

This is good for passing examples, because we can check if there is any change in interpreter or monitor behavior that changes how many or what traces show up even for passing tests, instead of a blanket "pass".

A roundtrip trace might fail like this due to a monitor error:

trace_block: 0
trace_result: FAIL
failure_kind: monitor_error
interpreter_trace:
trace {
    add(1, 2, 3);
    add(4, 5, 9);
}

monitor_error:
All schedulers failed: No transactions match the waveform for DUT `Adder`
Failure at cycle 1: No transactions match the waveform in `.roundtrip_tmp/adders-adder_d1-both_threads_pass_0.fst`.
Possible transactions: [add, add_fork_early, add_doesnt_end_in_step, add_incorrect, add_incorrect_implicit, wait_and_add]
Error: Monitor failed

Or it might fail where the actual trace is not in the monitor output, like this:

trace_block: 0
trace_result: FAIL
failure_kind: trace_mismatch
interpreter_trace:
trace {
    reset();
    push(2);
    pop(2);
    idle();
    push(3);
    pop(3);
}

parsed_monitor_trace_candidates:
candidate_monitor_trace: 0
trace {
    reset();
    push(2);
    pop(2);
}

monitor failures due to panics are sanitized to get rid of machine-specific thread numbers (observed this on the GitHub actions CI machine) and because the line numbers in the panicked at line will change constantly.

we might also consider putting in a low-priority issue that some of this can be ported into rust (I am thinking specifically of the logic for generating a trace, passing it to the monitor, comparing the resulting traces, etc.) so that we have native support instead of relying on janky string processing.

@ngernest
Copy link
Contributor

ngernest commented Feb 16, 2026

Looks good to me, thanks for working on this! I just updated the turnt.toml file to use uv run instead of python3 to invoke the round-tripping script (to maintain consistency with the other Python scripts in the repo).

I wonder if we should try to resolve all the failing round-trip tests in this PR or save them for another PR -- thoughts?

@Nikil-Shyamsunder
Copy link
Collaborator Author

I wonder if we should try to resolve all the failing round-trip tests in this PR or save them for another PR -- thoughts?

I think there are too many disparate non-trivial fixes to have them all in one PR.

@ngernest
Copy link
Contributor

Gotcha, I think Kevin @ekiwi normally prefers for all CI checks to be passing before merging a PR, I'll defer to him re: whether we can leave some round-trip tests failing when we merge this PR (i.e. fix them in a separate future PR).

@ngernest
Copy link
Contributor

This looks really good! Thanks for updating this so quickly! I'm happy with this

@Nikil-Shyamsunder
Copy link
Collaborator Author

Nikil-Shyamsunder commented Feb 17, 2026

Gotcha, I think Kevin @ekiwi normally prefers for all CI checks to be passing before merging a PR, I'll defer to him re: whether we can leave some round-trip tests failing when we merge this PR (i.e. fix them in a separate future PR).

Agreed. the check will "pass" now but it will be passing with some examples having trace_result: FAIL that we can either resolve before merging this or resolve later. Agree to defer to kevin. Thanks so much for the great feedback on this review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for Ernest: check why the monitor is failing here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for Ernest: check why monitor is failing here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for Ernest: investigate this (b ought to be defined)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for Ernest: this is probably a off-by-one error (check the waveform manually, it should have length 2)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If waveform has length 3, there's a bug w/ interpreter
  • If waveform has length 2, there's a bug w/ monitor

@ngernest
Copy link
Contributor

Context for @Nikil-Shyamsunder: Kevin and I discussed the failing round-trip tests today in-person, I left comments above on the ones that I need to investigate further (to see if they're bugs with the monitor). Kevin says we should wait till these failing tests are fixed before merging this PR if that's OK!

@Nikil-Shyamsunder
Copy link
Collaborator Author

sounds like a plan!

@ngernest
Copy link
Contributor

ngernest commented Feb 23, 2026

Added a new CLI flag --allow-round-trip-failure to roundtrip_case.py, which allows the user to mark an individual .tx file as allowed to fail in round-trip mode.

I added this flag to all the .tx files which call protocols that involve repeat ... loops, since those aren't supported in the monitor for now. (I will continue to fix the monitor for the other failing round-trip tests.)

Note: because of how Turnt's overrides work (the same .tx file is used as an input for both the interp and roundtrip environments in Turnt), I also needed to add --allow-round-trip-failure as a "dummy" CLI argument to the Rust interpreter executable (which just ignores this flag).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants